[CELEBORN-796] Support for globally disable thread-local cache in the shared PooledByteBufAllocator #1716
Codecov Report
@@ Coverage Diff @@
## main #1716 +/- ##
==========================================
- Coverage 46.86% 46.68% -0.17%
==========================================
Files 162 162
Lines 10027 10088 +61
Branches 923 929 +6
==========================================
+ Hits 4698 4709 +11
- Misses 5019 5071 +52
+ Partials 310 308 -2
... and 12 files with indirect coverage changes
Will disabling the cache hurt performance and increase CPU usage?
This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.
Let's revive this PR; it helps release direct memory. I recently hit a worker memory issue: the worker triggered a trim, after which the disk buffers became empty, but direct memory usage never dropped below the threshold, so the worker stayed in the PAUSE state forever.
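The failure mode described above can be illustrated with a toy model. Everything in it is hypothetical: `ToyWorker`, its byte counts, and the pause/resume ratios are made up for illustration and are not Celeborn's memory manager or its defaults. The model assumes a worker that pauses once direct memory usage reaches a pause ratio and resumes only after usage falls below a lower resume ratio, and, as a simplification, that bytes released into a thread-local cache stay reserved until that cache reuses them. Under those assumptions, a trim that empties the disk buffers does not lower the usage the worker sees, so it never leaves the PAUSE state.

```java
// Toy model only: not Celeborn's MemoryManager. All names, sizes and ratios
// are hypothetical and chosen for illustration.
final class ToyWorker {
  private final long capacity;        // total direct memory budget, in bytes
  private final double pauseRatio;    // pause when usage >= pauseRatio * capacity
  private final double resumeRatio;   // resume when usage < resumeRatio * capacity
  private final boolean threadLocalCache;

  private long inUse;   // bytes held by live buffers (e.g. disk buffers)
  private long cached;  // bytes released by callers but kept by a thread-local cache
  private boolean paused;

  ToyWorker(long capacity, double pauseRatio, double resumeRatio, boolean threadLocalCache) {
    this.capacity = capacity;
    this.pauseRatio = pauseRatio;
    this.resumeRatio = resumeRatio;
    this.threadLocalCache = threadLocalCache;
  }

  // Usage as the allocator would report it: cached bytes still count as reserved.
  long usedDirectMemory() {
    return inUse + cached;
  }

  boolean isPaused() {
    return paused;
  }

  void allocate(long bytes) {
    long reused = Math.min(cached, bytes); // serve from the cache first
    cached -= reused;
    inUse += bytes;
    updateState();
  }

  // A trim ends up here: buffers are released by their owners.
  void release(long bytes) {
    inUse -= bytes;
    if (threadLocalCache) {
      cached += bytes; // parked in the cache instead of returned to the pool
    }
    updateState();
  }

  private void updateState() {
    long used = usedDirectMemory();
    if (!paused && used >= pauseRatio * capacity) {
      paused = true;
    } else if (paused && used < resumeRatio * capacity) {
      paused = false;
    }
  }

  public static void main(String[] args) {
    for (boolean cache : new boolean[] {true, false}) {
      ToyWorker worker = new ToyWorker(100, 0.85, 0.5, cache);
      worker.allocate(90); // crosses the pause ratio
      worker.release(90);  // trim empties the disk buffers
      System.out.println("cache=" + cache + " paused=" + worker.isPaused()
          + " used=" + worker.usedDirectMemory());
    }
  }
}
```

With the cache enabled, the released 90 bytes are still counted and the worker stays paused; with it disabled, usage drops to 0 and the worker resumes.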
@@ -42,7 +42,8 @@

/** Utilities for creating various Netty constructs based on whether we're using EPOLL or NIO. */
public class NettyUtils {
  private static volatile PooledByteBufAllocator _allocator;
  private static final PooledByteBufAllocator[] _sharedPooledByteBufAllocator =
How about splitting it into two variables?
It's derived from Spark, and I'd prefer to keep it.
Not a big deal; both are fine.
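The design discussed in this thread keeps one shared allocator per `allowCache` value in a two-element array instead of two named fields. The sketch below shows that pattern with the standard library only. `FakeAllocator` is a hypothetical stand-in for Netty's `PooledByteBufAllocator`, and the slot convention (0 for cache enabled, 1 for cache disabled) is an assumption modeled on the Spark code the author refers to, not a copy of Celeborn's `NettyUtils`.

```java
// Hypothetical stand-in for Netty's PooledByteBufAllocator: it only records
// whether its thread-local cache would be enabled.
final class FakeAllocator {
  final boolean allowCache;

  FakeAllocator(boolean allowCache) {
    this.allowCache = allowCache;
  }
}

// Sketch of the "one array, two slots" holder discussed in review.
final class SharedAllocators {
  // Assumed convention: slot 0 = cache enabled, slot 1 = cache disabled.
  private static final FakeAllocator[] SHARED = new FakeAllocator[2];

  private SharedAllocators() {}

  // Lazily creates one instance per flag value and reuses it afterwards.
  // synchronized so that concurrent first callers cannot create two
  // instances for the same slot.
  static synchronized FakeAllocator get(boolean allowCache) {
    int index = allowCache ? 0 : 1;
    if (SHARED[index] == null) {
      SHARED[index] = new FakeAllocator(allowCache);
    }
    return SHARED[index];
  }
}
```

Two separately named fields, as suggested in review, would behave the same way; the array only turns the choice of field into an index computation.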
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
…scala Co-authored-by: Cheng Pan <[email protected]>
I think it's good to go, as it does not change the default behavior. Disabling the cache would generally be expected to hurt performance; any chance you could provide a benchmark with the cache enabled and disabled? e.g. TeraSort: https://github.com/pan3793/spark-terasort
buildConf("celeborn.network.memory.allocator.allowCache")
  .categories("network")
  .internal
  .version("0.4.0")
Maybe 0.3.1, if @waitinfuture doesn't object.
No objection :) I think we should merge this PR into 0.3.1.
The graph looks great!
Thanks! Merging to main/0.3.
… shared PooledByteBufAllocator

### What changes were proposed in this pull request?
As title

### Why are the changes needed?
As title

### Does this PR introduce _any_ user-facing change?
Yes, the thread local cache of shared `PooledByteBufAllocator` can be disabled by setting `celeborn.network.memory.allocator.allowCache=false`

### How was this patch tested?
Pass GA

Closes #1716 from cfmcgrady/allow-cache.
Authored-by: Fu Chen <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
(cherry picked from commit 6f1bb41)
Signed-off-by: zky.zhoukeyong <[email protected]>
…lt to false

### What changes were proposed in this pull request?
As title

### Why are the changes needed?
I tested 1.1T and 3.3T shuffle, as well as 3T TPCDS, with the thread cache on and off in the shared PooledByteBufAllocator and found no difference:

| Benchmark | Cache On | Cache Off |
| -------- | ------- | ------- |
| 1.1T Shuffle | 3.7min/1.9min | 3.7min/1.9min |
| 3.3T Shuffle | 12min/6.7min | 12min/6.2min |
| 3T TPCDS | 2645s | 2644s |

And since the configuration has a big influence on direct memory usage (see #1716), it's necessary to set the default value to false.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test.

Closes #1817 from waitinfuture/897.
Authored-by: zky.zhoukeyong <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
…lt to false

### What changes were proposed in this pull request?
As title

### Why are the changes needed?
I tested 1.1T and 3.3T shuffle, as well as 3T TPCDS, with the thread cache on and off in the shared PooledByteBufAllocator and found no difference:

| Benchmark | Cache On | Cache Off |
| -------- | ------- | ------- |
| 1.1T Shuffle | 3.7min/1.9min | 3.7min/1.9min |
| 3.3T Shuffle | 12min/6.7min | 12min/6.2min |
| 3T TPCDS | 2645s | 2644s |

And since the configuration has a big influence on direct memory usage (see #1716), it's necessary to set the default value to false.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test.

Closes #1817 from waitinfuture/897.
Authored-by: zky.zhoukeyong <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
(cherry picked from commit 57fdbf0)
Signed-off-by: zky.zhoukeyong <[email protected]>
What changes were proposed in this pull request?
As title
Why are the changes needed?
As title
Does this PR introduce any user-facing change?
Yes, the thread-local cache of the shared PooledByteBufAllocator can be disabled by setting celeborn.network.memory.allocator.allowCache=false
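A minimal sketch of how such a flag can be applied, assuming (as in Spark's Netty utilities) that disabling the cache means passing zero per-thread cache sizes and `useCacheForAllThreads = false` to the `PooledByteBufAllocator` constructor. `CacheSettings` is a hypothetical holder, and the defaults are supplied by the caller; real code would read them from Netty's `PooledByteBufAllocator.defaultSmallCacheSize()`, `defaultNormalCacheSize()` and `defaultUseCacheForAllThreads()`.

```java
// Hypothetical value object holding the thread-cache arguments that would be
// passed to Netty's PooledByteBufAllocator constructor.
final class CacheSettings {
  final int smallCacheSize;
  final int normalCacheSize;
  final boolean useCacheForAllThreads;

  CacheSettings(int smallCacheSize, int normalCacheSize, boolean useCacheForAllThreads) {
    this.smallCacheSize = smallCacheSize;
    this.normalCacheSize = normalCacheSize;
    this.useCacheForAllThreads = useCacheForAllThreads;
  }

  // allowCache=true keeps the caller's defaults; allowCache=false switches
  // every thread-cache knob off, so no per-thread cache holds freed buffers.
  static CacheSettings forAllowCache(boolean allowCache, CacheSettings defaults) {
    if (allowCache) {
      return defaults;
    }
    return new CacheSettings(0, 0, false);
  }
}
```

With both cache sizes at zero, freed buffers go back to the shared arenas instead of staying in a per-thread cache, which is why the flag affects how much direct memory the allocator retains.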
How was this patch tested?
Pass GA